Commuting Patterns¶
Using meaningful locations queries¶
In this worked example we demonstrate the use of FlowKit to investigate commuting patterns. We will use meaningful_locations_aggregate queries to calculate subscribers' home and work locations, following methods developed by Isaacman et al. and Zagatti et al.
The Jupyter notebook for this worked example can be downloaded here, or can be run using the quick start setup.
Load FlowClient and connect to FlowAPI¶
We start by importing FlowClient. We also import geopandas and mapboxgl, which we will use later to visualise the data.
import flowclient
import os
import numpy as np
import geopandas as gpd
import mapboxgl
from mapboxgl.utils import create_color_stops
To connect to FlowAPI, we first need an access token, which is generated using FlowAuth:

- Visit the FlowAuth login page at http://localhost:9091.
- Log in with username TEST_USER and password DUMMY_PASSWORD.
- Under "My Servers", select TEST_SERVER.
- Click the + button to create a new token.
- Give the new token a name, and click SAVE.
- Copy the token string using the COPY button.
- Paste the token in this notebook as TOKEN.

The steps are the same in a production setup, but the FlowAuth URL, login details and server name will differ.
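Rather than pasting the token string directly into the notebook, one option is to read it from an environment variable. This is a minimal sketch; the variable name FLOWKIT_TOKEN is an assumption for illustration, not a FlowKit convention:

```python
import os

# Assumed variable name: store the FlowAuth token in the FLOWKIT_TOKEN
# environment variable rather than pasting it into the notebook.
TOKEN = os.environ.get("FLOWKIT_TOKEN", "")
```

This keeps the token out of the saved notebook, which matters if notebooks are shared or committed to version control.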
Once we have a token, we can start a connection to the FlowAPI system. If you are connecting to FlowAPI over https (recommended) and the system administrator has provided you with an SSL certificate file, you should provide the path to this file as the ssl_certificate argument to flowclient.connect() (in this example, you can set the path in the environment variable SSL_CERTIFICATE_FILE). If you are connecting over http, this argument is not required.
conn = flowclient.connect(
url=os.getenv("FLOWAPI_URL", "http://localhost:9090"),
token=TOKEN,
ssl_certificate=os.getenv("SSL_CERTIFICATE_FILE"),
)
Create meaningful locations queries¶
We assign a day-of-week score of +1 to events which occur on weekdays (Monday-Friday), and a score of -1 to weekends (Saturday, Sunday). We assign an hour-of-day score of +1 to events during "working hours", which we define here as 08:00-17:00, and a score of -1 to evening hours 19:00-07:00. We then define two labels: we label locations with a positive hour-of-day score as "work", and locations with a negative hour-of-day score as "home".
tower_day_of_week_scores = {
"monday": 1,
"tuesday": 1,
"wednesday": 1,
"thursday": 1,
"friday": 1,
"saturday": -1,
"sunday": -1,
}
tower_hour_of_day_scores = [
-1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, -1, -1, -1, -1, -1
]
meaningful_locations_labels = {
"home": {
"type": "Polygon",
"coordinates": [[[-1, 1], [-1, -1], [-1e-06, -1], [-1e-06, 1]]],
},
"work": {
"type": "Polygon",
"coordinates": [[[0, 1], [0, -1], [1, -1], [1, 1]]],
},
}
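The two label polygons partition the score plane along the hour-of-day axis: "home" covers negative hour-of-day scores, "work" covers non-negative ones. The sketch below (not FlowKit code) shows how a (hour-of-day score, day-of-week score) point maps to a label, using a simple ray-casting point-in-polygon test:

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# The same rings as meaningful_locations_labels above (x = hour-of-day
# score, y = day-of-week score)
label_rings = {
    "home": [[-1, 1], [-1, -1], [-1e-06, -1], [-1e-06, 1]],
    "work": [[0, 1], [0, -1], [1, -1], [1, 1]],
}

def label_for(hour_score, day_score):
    for name, ring in label_rings.items():
        if point_in_polygon(hour_score, day_score, ring):
            return name
    return None

print(label_for(-0.5, 0.3))  # negative hour-of-day score → home
print(label_for(0.5, 0.3))   # positive hour-of-day score → work
```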
We use the flowclient.aggregates.meaningful_locations_aggregate_spec function to create parameter dictionaries for two meaningful locations queries: a "home location" query, which will count the number of subscribers with "evening" locations in each level 3 administrative region, and a "work location" query, which will instead count "daytime" locations.
home_locations_spec = flowclient.aggregates.meaningful_locations_aggregate_spec(
start_date="2016-01-01",
end_date="2016-01-08",
label="home",
labels=meaningful_locations_labels,
tower_day_of_week_scores=tower_day_of_week_scores,
tower_hour_of_day_scores=tower_hour_of_day_scores,
aggregation_unit="admin3",
)
work_locations_spec = flowclient.aggregates.meaningful_locations_aggregate_spec(
start_date="2016-01-01",
end_date="2016-01-08",
label="work",
labels=meaningful_locations_labels,
tower_day_of_week_scores=tower_day_of_week_scores,
tower_hour_of_day_scores=tower_hour_of_day_scores,
aggregation_unit="admin3",
)
We can then use the get_result function to get the results of the queries as pandas DataFrames.
home_locations = flowclient.get_result(
connection=conn, query_spec=home_locations_spec
)
work_locations = flowclient.get_result(
connection=conn, query_spec=work_locations_spec
)
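Each result is a DataFrame with one row per region, with columns pcod (the region's P-code), label and value (the subscriber count), as used in the join later in this example. A toy stand-in with made-up P-codes and counts, useful for experimenting with the downstream steps offline:

```python
import pandas as pd

# Toy stand-in for a meaningful locations aggregate result. The column
# names (pcod, label, value) match those used in the join below; the
# P-codes and counts here are invented for illustration.
home_locations = pd.DataFrame(
    {
        "pcod": ["NPL_001", "NPL_002", "NPL_003"],
        "label": ["home", "home", "home"],
        "value": [120, 45, 300],
    }
)
print(home_locations)
```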
Visualise the distributions of home/work locations¶
We use the get_geography function to download the geography for the level 3 administrative regions.
# Download geography data as GeoJSON
regions = flowclient.get_geography(connection=conn, aggregation_unit="admin3")
# Create a geopandas GeoDataFrame from the GeoJSON
regions_geodataframe = gpd.GeoDataFrame.from_features(regions)
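get_geography returns a GeoJSON FeatureCollection, in which each feature's properties include the region's pcod. A minimal sketch (with a made-up single feature) of pulling the P-codes out of the raw GeoJSON without geopandas:

```python
# Made-up FeatureCollection mimicking the structure returned by
# get_geography; the pcod value and coordinates are invented.
regions_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"pcod": "NPL_001"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[84, 28], [85, 28], [85, 29], [84, 28]]],
            },
        }
    ],
}

pcods = [feature["properties"]["pcod"] for feature in regions_geojson["features"]]
print(pcods)
```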
We can now visualise the spatial distributions of home and work locations, using the mapboxgl library for visualisation.
Note: Mapbox requires an access token, which should be set as the environment variable MAPBOX_ACCESS_TOKEN. This is only required for producing the Mapbox visualisations, which are entirely separate from FlowKit.
# Join location counts to geography data
locations_geodataframe = (
regions_geodataframe.drop(columns="centroid")
.join(
home_locations.drop(columns="label").set_index("pcod"),
on="pcod",
how="left",
)
.join(
work_locations.drop(columns="label").set_index("pcod"),
on="pcod",
lsuffix="_home",
rsuffix="_work",
how="left",
)
.fillna(value={"value_home":0, "value_work":0})
)
# Rename columns for map labels
locations_geodataframe = locations_geodataframe.rename(
columns={
"pcod": "P-code",
"value_home": "Total (home)",
"value_work": "Total (work)",
}
)
locations_to_show = "home"  # Change to "work" to show work-location counts
mapbox_token = os.environ["MAPBOX_ACCESS_TOKEN"]
# Colour scale for legend
max_total = max([home_locations["value"].max(), work_locations["value"].max()])
color_stops = create_color_stops(np.linspace(0, max_total, 9), colors="YlGn")
locations_viz = mapboxgl.ChoroplethViz(
locations_geodataframe.__geo_interface__,
access_token=mapbox_token,
color_property=f"Total ({locations_to_show})",
color_stops=color_stops,
opacity=0.8,
line_color="black",
line_width=0.5,
legend_gradient=True,
legend_layout="horizontal",
legend_text_numeric_precision=0,
below_layer="waterway-label",
center=(84.1, 28.4),
zoom=5.5,
)
locations_viz.show()